DOMe: A deduplication optimization method for the NewSQL database backups

Authors

  • Longxiang Wang
  • Zhengdong Zhu
  • Xingjun Zhang
  • Xiaoshe Dong
  • Yinfeng Wang
Abstract

Reducing duplicated data in database backups is an important application scenario for data deduplication technology. NewSQL is an emerging class of database system that is being used more and more widely. NewSQL systems need to improve data reliability by periodically backing up in-memory data, which produces a large amount of duplicated data. The traditional deduplication method is not optimized for the NewSQL server system and cannot take full advantage of hardware resources to improve deduplication performance. Recent research has pointed out that future NewSQL servers will have thousands of CPU cores, large DRAM, and huge NVRAM. Therefore, how to utilize these hardware resources to optimize the performance of data deduplication is an important issue. To solve this problem, we propose a deduplication optimization method (DOMe) for NewSQL system backup. To take advantage of the large number of CPU cores in the NewSQL server, DOMe parallelizes the deduplication method based on the fork-join framework. The fingerprint index, which is the key data structure in the deduplication process, is implemented as a pure in-memory hash table, which makes full use of the large DRAM in the NewSQL system and eliminates the fingerprint-index performance bottleneck of traditional deduplication methods. H-Store is used as a typical NewSQL database system to implement the DOMe method. DOMe is experimentally analyzed using two representative backup datasets. The experimental results show that: 1) DOMe can reduce the duplicated NewSQL backup data; 2) DOMe significantly improves deduplication performance by parallelizing the CDC algorithm: with a theoretical server speedup ratio of 20.8, DOMe achieves a speedup ratio of up to 18; 3) DOMe improves deduplication throughput by 1.5 times through the pure in-memory index optimization.
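The abstract names the two optimizations but gives no implementation details, so the following is only a minimal illustrative sketch of the general approach: a backup buffer is split recursively with Java's fork-join framework (java.util.concurrent.ForkJoinPool / RecursiveTask), each segment is chunked with a simplified content-defined chunking (CDC) pass, and every chunk fingerprint is looked up in a pure in-memory ConcurrentHashMap standing in for the DRAM-resident fingerprint index. Java is used because the fork-join framework referenced in the abstract is Java's and H-Store is Java-based; the class and field names (ForkJoinDedupSketch, DedupTask, INDEX), the toy rolling hash, and the chunking parameters (2 KiB minimum, 64 KiB maximum, 13-bit boundary mask, 4 MiB segments) are assumptions for illustration and are not taken from the paper.

```java
import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Illustrative sketch: fork-join parallel CDC deduplication with an in-memory fingerprint index. */
public class ForkJoinDedupSketch {

    // Pure in-memory fingerprint index (hex SHA-1 -> chunk length); stands in for a DRAM-resident index.
    static final ConcurrentHashMap<String, Integer> INDEX = new ConcurrentHashMap<>();

    // Illustrative chunking parameters (not from the paper): ~8 KiB average chunk via a 13-bit boundary mask.
    static final int MASK = (1 << 13) - 1;
    static final int MIN_CHUNK = 2 * 1024, MAX_CHUNK = 64 * 1024;
    static final int SEQUENTIAL_THRESHOLD = 4 * 1024 * 1024; // split the backup stream into 4 MiB segments

    /** Recursively splits the backup buffer into segments and deduplicates them in parallel. */
    static class DedupTask extends RecursiveTask<Long> {
        final byte[] data; final int from, to;
        DedupTask(byte[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

        @Override protected Long compute() {
            if (to - from <= SEQUENTIAL_THRESHOLD) {
                return dedupSegment(data, from, to);   // duplicate bytes found in this segment
            }
            int mid = from + (to - from) / 2;
            DedupTask left = new DedupTask(data, from, mid);
            DedupTask right = new DedupTask(data, mid, to);
            left.fork();                               // run the left half asynchronously
            long saved = right.compute();              // compute the right half on this worker
            return saved + left.join();
        }
    }

    /** Sequential CDC over one segment: a toy rolling hash marks chunk boundaries. */
    static long dedupSegment(byte[] data, int from, int to) {
        long saved = 0;
        int start = from;
        int hash = 0;
        for (int i = from; i < to; i++) {
            hash = (hash << 1) + (data[i] & 0xFF);     // simplified gear-style hash, for illustration only
            int len = i - start + 1;
            boolean boundary = (len >= MIN_CHUNK && (hash & MASK) == 0) || len >= MAX_CHUNK || i == to - 1;
            if (boundary) {
                String fp = sha1Hex(data, start, len);
                // putIfAbsent returns non-null when the fingerprint is already indexed -> duplicate chunk.
                if (INDEX.putIfAbsent(fp, len) != null) saved += len;
                start = i + 1;
                hash = 0;
            }
        }
        return saved;
    }

    static String sha1Hex(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(data, off, len);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        byte[] backup = new byte[32 * 1024 * 1024];    // stand-in for a NewSQL backup snapshot
        long saved = ForkJoinPool.commonPool().invoke(new DedupTask(backup, 0, backup.length));
        System.out.println("duplicate bytes detected: " + saved);
    }
}
```

Splitting the stream into fixed-size segments before chunking is what lets the fork-join work-stealing pool keep many cores busy; DOMe parallelizes its CDC algorithm along broadly similar lines, though the paper's exact chunking and indexing parameters are not given in the abstract.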


Similar resources

Survey on Fragmentation for Deduplication in Backup Storage

In the field of backup environments, deduplication yields major advantages. Deduplication is the process of automatically eliminating duplicate data in a storage system, and it is the most effective technique for reducing storage costs. Deduplication predictably results in data fragmentation, because logically continuous data is spread across many disk locations. Fragmentation is mainly caused by duplicates from previ...


Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form and makes it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, w...


NewSQL: Towards Next-Generation Scalable RDBMS for Online Transaction Processing (OLTP) for Big Data Management

One of the key advances in resolving the “big-data” problem has been the emergence of an alternative database technology. Today, classic RDBMS are complemented by a rich set of alternative Data Management Systems (DMS) specially designed to handle the volume, variety, velocity and variability of Big Data collections; these DMS include NoSQL, NewSQL and Search-based systems. NewSQL is a class of...


The Optimization of Large-Scale Dome Trusses on the Basis of the Probability of Failure

Metaheuristic algorithms are preferred by many researchers for the reliability-based design optimization (RBDO) of truss structures. The cross-sectional areas of the truss elements are considered as design variables for size optimization under frequency constraints. The design of dome truss structures is optimized based on reliability by a popular metaheuristic optimization tec...


Efficiently Storing Virtual Machine Backups

Physical level backups offer increased performance in terms of throughput and scalability as compared to logical backup models, while still maintaining logical consistency [2]. As the trend toward virtualization grows, virtual machine backups (a form of physical backup) are even more important, while becoming easier to perform. The downside is that physical backup generally requires more storag...



Journal title:

Volume 12, Issue

Pages -

Publication date: 2017